Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS
Real-time live applications in virtual reality are increasingly common, and capturing and retargeting 3D human pose plays an important role in them. However, it is still challenging to estimate accurate 3D pose from consumer imaging devices such as depth cameras. This paper presents a novel cascaded 3D full-body pose
regression method to estimate accurate pose from a single depth image at 100
fps. The key idea is to train cascaded regressors based on the gradient
boosting algorithm from a pre-recorded human motion capture database. By
incorporating a hierarchical kinematic model of human pose into the learning
procedure, we can directly estimate accurate 3D joint angles instead of joint
positions. The
biggest advantage of this model is that the bone length can be preserved during
the whole 3D pose estimation procedure, which leads to more effective features
and higher pose estimation accuracy. Our method can also be used as an
initialization procedure when combined with tracking methods. We demonstrate
the power of our method on a wide range of synthesized human motion data from
the CMU mocap database, the Human3.6M dataset, and real human movement data
captured in real time. In our comparison against previous 3D pose estimation
methods and commercial systems such as Kinect 2017, we achieve
state-of-the-art accuracy.
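The bone-length argument can be made concrete with a toy forward-kinematics chain: if a regressor predicts joint angles and joint positions are recovered by rotating fixed-length bones, bone lengths are preserved by construction. The sketch below is a minimal illustration under our own assumptions (an axis-angle parameterization and a simple chain skeleton), not the paper's implementation:

```python
import numpy as np

def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: rotation matrix from a unit axis and an angle."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def forward_kinematics(root, parents, bone_lengths, axes, angles):
    """Place each joint by rotating a fixed-length bone in its parent's frame.

    Because only the angles are free parameters, every bone length is
    preserved exactly -- the property the abstract highlights.  The chain
    layout and argument names here are illustrative assumptions.
    """
    positions = [np.asarray(root, dtype=float)]
    rotations = [np.eye(3)]
    for i in range(len(parents)):
        R = rotations[parents[i]] @ rotation_from_axis_angle(axes[i], angles[i])
        offset = R @ np.array([0.0, bone_lengths[i], 0.0])  # bone along local y
        positions.append(positions[parents[i]] + offset)
        rotations.append(R)
    return np.array(positions)

# A 3-joint chain: two bones of fixed length, driven only by angles.
parents = [0, 1]
lengths = [0.5, 0.4]
axes = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])]
angles = [0.3, -0.6]  # the quantities a cascaded regressor would predict
joints = forward_kinematics([0.0, 0.0, 0.0], parents, lengths, axes, angles)
```

Whatever angles the regressor outputs, the distances between connected joints stay equal to the stored bone lengths, which is why angle regression yields more consistent features than direct position regression.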
Variational Autoencoders for Deforming 3D Mesh Models
3D geometric content is becoming increasingly popular. In this paper, we
study the problem of analyzing deforming 3D meshes using deep neural networks.
Deforming 3D meshes can flexibly represent 3D animation sequences as well as
collections of objects of the same category, allowing diverse shapes with
large-scale non-linear deformations. We propose a novel framework which we call
mesh variational autoencoders (mesh VAE), to explore the probabilistic latent
space of 3D surfaces. The framework is easy to train, and requires very few
training examples. We also propose an extended model which allows flexibly
adjusting the significance of different latent variables by altering the prior
distribution. Extensive experiments demonstrate that our general framework is
able to learn a reasonable representation for a collection of deformable
shapes, and produce competitive results for a variety of applications,
including shape generation, shape interpolation, shape space embedding and
shape exploration, outperforming state-of-the-art methods.
Comment: CVPR 201
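As a sketch of how altering the prior can adjust the significance of individual latent variables, consider the per-dimension KL term of a Gaussian VAE measured against a non-unit prior: shrinking the prior's standard deviation for one dimension penalizes its use more heavily. The function name and scalar-prior parameterization below are our assumptions, not the paper's model:

```python
import numpy as np

def kl_to_prior(mu, log_var, prior_std):
    """KL( N(mu, sigma^2) || N(0, prior_std^2) ), elementwise per latent dim.

    Closed form: 0.5 * (sigma^2/p^2 + mu^2/p^2 - 1 + log p^2 - log sigma^2),
    with p = prior_std.  A smaller prior_std makes the KL penalty for a
    dimension steeper, discouraging the encoder from using it -- one way to
    reweight latent variables by altering the prior distribution.
    """
    var = np.exp(log_var)
    p2 = prior_std ** 2
    return 0.5 * (var / p2 + mu ** 2 / p2 - 1.0 + np.log(p2) - log_var)
```

A posterior matching the prior (mu = 0, sigma = prior_std = 1) gives zero KL, while the same posterior measured against a tighter prior pays a larger penalty.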
Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation
3D human pose estimation has been a long-standing challenge in computer
vision and graphics, where multi-view methods have significantly progressed but
are limited by tedious calibration processes. Existing multi-view methods
are restricted to fixed camera poses and therefore lack generalization ability.
This paper presents a novel Probabilistic Triangulation module that can be
embedded in a calibrated 3D human pose estimation method, generalizing it to
uncalibrated scenes. The key idea is to use a probability distribution to
model the camera pose and to iteratively update the distribution from 2D
features instead of relying on a known camera pose. Specifically, we maintain a camera pose
distribution and then iteratively update this distribution by computing the
posterior probability of the camera pose through Monte Carlo sampling. This
way, the gradients can be directly back-propagated from the 3D pose estimation
to the 2D heatmap, enabling end-to-end training. Extensive experiments on
Human3.6M and CMU Panoptic demonstrate that our method outperforms other
uncalibrated methods and achieves comparable results to state-of-the-art
calibrated methods. Thus, our method achieves a trade-off between estimation
accuracy and generalizability. Our code is available at
https://github.com/bymaths/probabilistic_triangulation
Comment: 9 pages, 5 figures, conferenc
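A minimal sketch of one Monte Carlo update of a camera-pose distribution, in the spirit of the module described above: sample candidate poses, weight them by how well the 2D features agree with their projections, and summarize the posterior. The 1-D pose parameter, Gaussian summary, and Boltzmann-style weighting are illustrative assumptions rather than the paper's actual interface:

```python
import numpy as np

def update_pose_distribution(samples, reproj_error, temperature=0.1):
    """One Monte Carlo update of a (1-D, Gaussian-summarized) pose belief.

    `samples` are candidate pose parameters drawn from the current belief;
    `reproj_error(theta)` measures the disagreement between observed 2D
    features and the projection under pose theta.  Posterior weights follow
    exp(-error / temperature); the weighted mean and std form the updated
    distribution.  All names here are hypothetical.
    """
    errors = np.array([reproj_error(s) for s in samples])
    w = np.exp(-(errors - errors.min()) / temperature)  # stabilized weights
    w /= w.sum()
    mean = np.sum(w * samples)
    std = np.sqrt(np.sum(w * (samples - mean) ** 2))
    return mean, std

# Toy usage: a broad prior belief concentrates near the pose (0.4) that
# minimizes the reprojection error.
rng = np.random.default_rng(0)
prior_samples = rng.normal(0.0, 1.0, 5000)
mean, std = update_pose_distribution(
    prior_samples, lambda t: (t - 0.4) ** 2, temperature=0.05)
```

Because the weights are a differentiable function of the errors, a loss on the final 3D pose can in principle back-propagate through such a sampling-and-weighting step to the 2D features, matching the end-to-end training the abstract describes.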
AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
Generating 3D human motion based on textual descriptions has been a research
focus in recent years. The task requires the generated motion to be diverse
and natural, and to conform to the textual description. Due to the complex spatio-temporal
nature of human motion and the difficulty in learning the cross-modal
relationship between text and motion, text-driven motion generation is still a
challenging problem. To address these issues, we propose AttT2M, a two-stage
method with a multi-perspective attention mechanism: body-part attention and
global-local motion-text attention. The former focuses
on the motion-embedding perspective, introducing a body-part spatio-temporal
encoder into the VQ-VAE to learn a more expressive discrete latent space. The
latter works from the cross-modal perspective and is used to learn
the sentence-level and word-level motion-text cross-modal relationship. The
text-driven motion is finally generated with a generative transformer.
Extensive experiments conducted on HumanML3D and KIT-ML demonstrate that our
method outperforms current state-of-the-art works in terms of qualitative and
quantitative evaluation, and achieves fine-grained synthesis and
action2motion. Our code is available at https://github.com/ZcyMonkey/AttT2M
Comment: IEEE International Conference on Computer Vision 2023, 9 page
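The word-level motion-text attention described above can be sketched as ordinary scaled dot-product cross-attention, with motion tokens as queries and word embeddings as keys and values (sentence-level attention would use a single pooled sentence embedding instead). The shapes and names here are our illustrative assumptions:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (numpy sketch, single head).

    queries: (num_motion_tokens, d) -- e.g. discrete motion-token embeddings
    keys, values: (num_words, d)    -- e.g. word-level text embeddings
    Each motion token gathers a convex combination of word values, weighted
    by normalized query-key similarity.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over words
    return weights @ values

# Toy usage: 2 motion tokens attend over 3 word embeddings of dimension 4.
q = np.ones((2, 4))
k = np.eye(4)[:3]
v = np.tile(np.arange(4.0), (3, 1))
out = cross_attention(q, k, v)
```

In the toy example every word value is identical, so each motion token receives exactly that value; with distinct values the attention weights select the words most relevant to each token.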
Rigidity controllable as-rigid-as-possible shape deformations
Shape deformation is one of the fundamental techniques in geometric processing. One principle of deformation is to preserve the geometric details while distributing the necessary distortions uniformly. To achieve this, state-of-the-art techniques deform shapes in a locally as-rigid-as-possible (ARAP) manner. Existing ARAP deformation methods optimize rigid transformations in the 1-ring neighborhoods and maintain the consistency between adjacent pairs of rigid transformations by single overlapping edges. In this paper, we go one step further and propose to use larger local neighborhoods to enhance the consistency of adjacent rigid transformations. This helps preserve geometric details better and distribute the distortions more uniformly. Moreover, the size of the expanded local neighborhoods provides an intuitive parameter for adjusting physical stiffness: the larger the neighborhood, the more rigid the material. Based on these observations, we propose a novel rigidity-controllable mesh deformation method in which shape rigidity can be flexibly adjusted. The size of the local neighborhoods can be learned automatically from datasets of deforming objects or specified by the user, and may vary over the surface to simulate shapes composed of mixed materials. Various examples are provided to demonstrate the effectiveness of our method.
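The local step of ARAP-style methods fits a single best rotation to each neighborhood's edge vectors; enlarging the point set passed to such a fit corresponds to the larger neighborhoods above, since more points must then share one rigid transformation and the material behaves more stiffly. A standard Kabsch/SVD sketch of that local fit (function names are our own, not the paper's):

```python
import numpy as np

def best_fitting_rotation(rest_pts, deformed_pts):
    """Rotation that best maps a neighborhood's rest-pose points onto its
    deformed points in the least-squares sense (Kabsch algorithm).

    rest_pts, deformed_pts: (n, 3) arrays of corresponding points.
    Centering removes translation; the SVD of the cross-covariance gives
    the optimal rotation, with a reflection check on the determinant.
    """
    P = rest_pts - rest_pts.mean(axis=0)
    Q = deformed_pts - deformed_pts.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against an improper (reflecting) fit
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R

# Toy usage: a neighborhood rotated rigidly about z is recovered exactly.
rng = np.random.default_rng(1)
pts = rng.normal(size=(10, 3))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
R_fit = best_fitting_rotation(pts, pts @ R_true.T)
```

When the neighborhood deforms non-rigidly, the residual after applying the fitted rotation is the per-neighborhood ARAP energy; summing it over larger, overlapping neighborhoods is what couples adjacent transformations more tightly.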
- …